Car Speech Enhancement Using Microphone Array Beamforming and Post Filters
Author: Chien et al.
ABSTRACT
This paper presents a speech enhancement method for suppressing car noise interference. A linear microphone array is adopted for far-talking speech acquisition and for noise reduction by delay-and-sum beamforming. We exploit an effective time delay estimator based on the coherence function between the reference microphone and the beamformed speech. To further enhance the beamformed speech, we design an improved Wiener filter in which the resulting noise correlation in the microphone array is relatively small, so that the filter approaches optimal Wiener filtering performance. In addition, because car speech is seriously degraded at low frequencies, we develop a spectral weighting function to compensate the low-frequency filtering. These two processing units serve as the post filters. In experiments on microphone array speech in the presence of real car noise, the proposed algorithm works well in producing enhanced speech. The combination of the delay-and-sum beamformer and the two post filters obtains the best results among the compared methods.

INTRODUCTION

Under new traffic regulations, which are enforced in many countries, drivers are forbidden to make hand-held cellular phone calls while driving. It is therefore necessary to develop a robust hands-free, far-talking speech acquisition system for drivers so that the desired speech recognition performance can be achieved. In car environments, however, many noise sources, including the engine, air conditioner, wind, music, babble, and echoes, make the signal-to-noise ratio (SNR) of the speech signal relatively low. A speech acquisition system using a microphone array is a feasible way to reduce the distortion caused by car noise and reverberation. With the delay-and-sum beamformer, the noise in the microphone signals is smoothed and suppressed. We endeavor to develop a microphone-array-based speech enhancement algorithm to improve the flexibility of human-car communication systems.
In the literature, microphone arrays have been applied to location estimation [5][6] and noise reduction, among other tasks. Owing to the power of beamforming, the microphone array has been recognized as a crucial speech enhancement approach. The delay-and-sum beamformer enhances noisy speech by synchronizing the time delays between microphones. Without accurate time delays, the microphone signals interfere with one another. Carter [2] used maximum likelihood theory to estimate the time delay. Omologo and Svaizer [5] sought the time delay that maximizes the cross-power spectrum phase (CSP). Furthermore, post filters for the delay-and-sum beamformer help improve speech quality considerably. Zelinski [7] applied an adaptive Wiener filter as the post processor to track time-varying statistics in the desired speech signal.

In this paper, we develop a new coherence function to detect the discrete delay point for delay-and-sum beamforming. The coherence function measures the degree of correlation between two signals. It was estimated in [2] and used as a filter for noise reduction in [1]. Here, we adopt the coherence function between the reference microphone and the beamformed speech to find the delay point: the selected delay is the one whose beamformed speech yields the maximum coherence function. In addition, we present two post filters that significantly increase the SNR of the acquired speech. One is based on an improved Wiener filter and the other is a spectral weighting function. Conventionally, the optimal Wiener filter is derived by assuming that the noise components at different microphones are mutually uncorrelated. In realistic environments this assumption does not hold well, so the optimal filtering performance is not obtained. We develop a new Wiener filter that uses the beamformed speech and the reference microphone speech as its input signals. The noise correlation is thereby reduced, so that the filter approaches the Wiener optimum.
Also, a spectral weighting function is applied to suppress the nonspeech segments and to enhance the speech segments whose formants concentrate in the low-frequency region. It serves as a compensator for car speech enhancement. Our experiments show that the combined beamforming and postfiltering approach achieves higher SNRs and recognition rates than the baseline and the individual methods.

Chien et al. Car Speech Enhancement. Proceedings of the 9th Australian International Conference on Speech Science & Technology, Melbourne, December 2 to 5, 2002. Australian Speech Science & Technology Association Inc. Accepted after full review. Page 569.

TIME DELAY ESTIMATION

System overview

Figure 1 illustrates the overall architecture of the microphone array speech enhancement system. The system contains a linear microphone array of $M$ sensors spaced at an equal distance of $d$ cm, with digital signals $\{x_i(m),\ i = 1, \ldots, M\}$ that are used for delay-and-sum beamforming speech acquisition. When microphone $x_i(m)$ acquires the plane wave, the wave travels a further distance before it arrives at the adjacent microphone $x_{i+1}(m)$. If the sound velocity is $c$ cm/sec and the speech signal is sampled at frequency $f_s$ Hz, the maximum discrete delay is calculated by

$$\tau_{\max} = \frac{d \cdot f_s}{c}.$$

To find the most likely delay point $\hat{\tau}$ in the range $-\tau_{\max} \le \hat{\tau} \le \tau_{\max}$, we calculate the coherence function between the reference microphone signal $x_r(m)$ and the beamformed signal

$$x(m, \tau) = \frac{1}{M} \sum_{i=1}^{M} x_i\big(m + (i-1)\tau\big), \qquad (1)$$

given a presumed delay $\tau$. The first microphone is taken as the reference microphone, i.e. $x_r(m) = x_1(m)$. All microphone signals are synchronized and beamformed together. To calculate the coherence function, the time signals $x_r(m)$ and $x(m, \tau)$ are converted to their spectral counterparts $X_r(f)$ and $X(f, \tau)$ via the FFT. The most likely delay point $\hat{\tau}$ is then determined, which gives the beamformed spectral signal $X(f, \hat{\tau})$.
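The maximum discrete delay and the beamformer of Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `max_discrete_delay` and `delay_and_sum` are hypothetical, the default sound velocity of 34000 cm/sec is an approximate room-temperature value not given in the text, the delay is restricted to whole samples, and samples that fall outside the recording are treated as zero.

```python
def max_discrete_delay(d_cm, fs_hz, c_cm_per_s=34000.0):
    """tau_max = d * f_s / c, in samples (the result may be fractional).

    The default sound velocity is an assumed approximate value.
    """
    return d_cm * fs_hz / c_cm_per_s


def delay_and_sum(signals, tau):
    """Eq. (1): x(m, tau) = (1/M) * sum_i x_i(m + (i - 1) * tau).

    signals: list of M equal-length sample lists; signals[0] is the reference.
    tau: integer inter-microphone delay in samples (may be negative).
    Samples indexed outside the recording count as zero (an assumption).
    """
    num_mics = len(signals)
    length = len(signals[0])
    out = []
    for m in range(length):
        acc = 0.0
        # enumerate() is zero-based, so idx plays the role of (i - 1) in Eq. (1)
        for idx, x in enumerate(signals):
            k = m + idx * tau
            if 0 <= k < length:
                acc += x[k]
        out.append(acc / num_mics)
    return out


if __name__ == "__main__":
    # Toy input: a unit pulse that reaches each successive microphone 2 samples later.
    mics = [[0.0] * 16 for _ in range(3)]
    for i in range(3):
        mics[i][5 + 2 * i] = 1.0
    print(delay_and_sum(mics, 2))   # pulses line up at one sample index
    print(delay_and_sum(mics, 0))   # pulses stay spread out and are attenuated
    print(max_discrete_delay(5.0, 8000.0))  # illustrative spacing and sampling rate
```

With the correct delay the three pulses add coherently at a single index, whereas with a wrong delay each pulse is only scaled by 1/M; this is the effect the delay search relies on.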
In the second part, postfiltering, we enhance the beamformed signal $X(f, \hat{\tau})$ with an improved Wiener filter $H_w(f)$ and a spectral weighting function $H_s(f)$ on a frame-by-frame basis. The signals $X_r(f)$ and $X(f, \hat{\tau})$ act as the inputs for postfiltering. A filter selector that combines the two post filters determines the appropriate filter $\hat{H}(f)$ so as to obtain the enhanced spectral signal $\hat{S}(f)$. Finally, the IFFT is employed to convert the spectral signal $\hat{S}(f)$ back to the time signal $\hat{s}(t)$. This completes the microphone array speech enhancement procedure.

Figure 1. System diagram of the proposed speech enhancement. (Block labels: FFT; Delay Estimation Using Coherence Function; Spectral Weighting Function; Improved Wiener Filter.)

Time delay estimation using coherence function

The coherence function at frequency $f$ between two wide-sense stationary random processes $x(m)$ and $y(m)$ is written as [2]
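As a rough illustration of the delay search, the sketch below scores each candidate delay by the coherence between the reference microphone and the beamformed signal and keeps the best one. Several things here are assumptions rather than the authors' formulation: it uses the textbook magnitude-squared coherence estimated by averaging over non-overlapping frames, it takes the mean over frequency bins as the score, it uses a plain DFT in place of the FFT to stay within the standard library, and the names `dft`, `msc`, `delay_and_sum`, and `estimate_delay` are hypothetical.

```python
import cmath
import math
import random


def dft(frame):
    """Plain O(N^2) DFT, bins 0..N/2 (a stdlib stand-in for the FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def msc(x, y, frame_len=16):
    """Magnitude-squared coherence per bin, averaged over non-overlapping frames."""
    bins = frame_len // 2 + 1
    sxy = [0j] * bins
    sxx = [0.0] * bins
    syy = [0.0] * bins
    for start in range(0, len(x) - frame_len + 1, frame_len):
        fx = dft(x[start:start + frame_len])
        fy = dft(y[start:start + frame_len])
        for k in range(bins):
            sxy[k] += fx[k] * fy[k].conjugate()   # cross-spectrum estimate
            sxx[k] += abs(fx[k]) ** 2             # auto-spectrum of x
            syy[k] += abs(fy[k]) ** 2             # auto-spectrum of y
    # Small constant guards against division by zero on silent input.
    return [abs(sxy[k]) ** 2 / (sxx[k] * syy[k] + 1e-12) for k in range(bins)]


def delay_and_sum(signals, tau):
    """Eq. (1) with integer tau; out-of-range samples count as zero (assumption)."""
    length = len(signals[0])
    out = []
    for m in range(length):
        acc = 0.0
        for idx, x in enumerate(signals):
            k = m + idx * tau
            if 0 <= k < length:
                acc += x[k]
        out.append(acc / len(signals))
    return out


def estimate_delay(signals, tau_max):
    """Integer tau in [-tau_max, tau_max] with the highest mean coherence
    between the reference microphone and the beamformed signal."""
    ref = signals[0]

    def score(tau):
        c = msc(ref, delay_and_sum(signals, tau))
        return sum(c) / len(c)

    return max(range(-tau_max, tau_max + 1), key=score)


if __name__ == "__main__":
    # Synthetic test signal: white source, 2-sample delay per microphone,
    # independent white noise at each microphone (all values illustrative).
    random.seed(0)
    n = 1024
    src = [random.gauss(0, 1) for _ in range(n + 8)]
    mics = [[src[m - 2 * i + 4] + 0.5 * random.gauss(0, 1) for m in range(n)]
            for i in range(3)]
    print(estimate_delay(mics, 4))
```

The search works because a wrong delay lowers the signal gain of the beamformer relative to the uncorrelated noise, which lowers the coherence with the reference channel. Short frames are used so that a misaligned signal is not simply seen as a filtered copy of the reference.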
Publication date: 2002